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Abstract 

Theory suggests that biological modularity and robustness allow for maintenance of fitness under mutational change, and 
when this change is adaptive, for evolvability. Empirical demonstrations that these traits promote evolvability in nature 
remain scant however. This is in part because modularity, robustness, and evolvability are difficult to define and measure in 
real biological systems. Here, we address whether structural modularity and/or robustness confer evolvability at the level of 
proteins by looking for associations between indices of protein structural modularity, structural robustness, and evolvability. 
We propose a novel index for protein structural modularity: the number of regular secondary structure elements (helices and 
strands) divided by the number of residues in the structure. We index protein evolvability as the proportion of sites with 
evidence of being under positive selection multiplied by the average rate of adaptive evolution at these sites, and we 
measure this as an average over a phylogeny of 25 mammalian species. We use contact density as an index of protein 
designability, and thus, structural robustness. We find that protein evolvability is positively associated with structural 
modularity as well as structural robustness and that the effect of structural modularity on evolvability is independent of 
the structural robustness index. We interpret these associations to be the result of reduced constraints on amino acid 
substitutions in highly modular and robust protein structures, which results in faster adaptation through natural selection. 
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Introduction 

The extensive robustness of biological systems has long 
fascinated biologists. Robustness can be defined as the 
tendency for a system to maintain functionality under per- 
turbation. Here, we will specifically concern ourselves with 
robustness under mutational perturbation because it is 
heritable change that is most immediately relevant to evolv- 
ability, which is the ability to respond to positive selection 
(Wagner and Altenberg 1996; Pigliucci 2008; Wagner 
2008). Although robustness can, in theory, stifle adaptation 
under certain circumstances (i.e., under "neutral confine- 
ment" where mutational change does not cause significant 
phenotypic change) (Ancel and Fontana 2000; Sumedha 
et al. 2007; Cowperthwaite et al. 2008; Draghi et al. 
2010), it generally confers evolvability to living systems 
because it allows them to undergo innovative modification 
without losing functionality (i.e., because adaptation is not 



typically limited by the extent to it can change via mutation, 
but rather, by the extent to which mutational change creates 
useful, nonlethal phenotypic variation) (Wagner 2005; Wag- 
ner 2008). Robustness also serves to maintain high fitness 
under conditions of random genetic and environmental 
noise (Wagner et al. 1 997; Gibson and Wagner 2000; Mei- 
klejohn and Hartl 2002; Wagner 2005). Modularity— which 
we define as the clustering of epistatic interactions — is an 
important form of robustness because it limits the number 
of system components that are affected by a given pertur- 
bation (Wagner and Altenberg 1996; Wagner 1996; Ancel 
and Fontana 2000; Fontana 2002; Kitano 2004; Bhattachar- 
yya et al. 2006; Wagner et al. 2007). Diverse hypotheses for 
the origin of modularity have been proposed, and there has 
yet to be agreement on a final answer to this question (Lip- 
son et al. 2002; Gardner and Zuidema 2003; Force et al. 
2005; Misevic et al. 2006; Lynch 2007; Wagner et al. 
2007). Similar to the case of robustness, while modularity 
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can be linked to reduced evolvability in some specific scenar- 
ios (Ancel and Fontana 2000; Hansen 2002; Griswold 2006), 
the consensus is that it generally facilitates adaptive change 
(Wagner and Altenberg 1996; Gerhartand Kirschner 1997; 
Bogarad and Deem 1 999; Hartwell et al. 1 999; Yang 2001 ; 
Cui et al. 2002; Xia and Levitt 2002; Beldade and Brakefield 
2003; Bhattacharyya et al. 2006; Chen and Dokholyan 
2006; Franz-Odendaal and Hall 2006; Pereira-Leal et al. 
2006). Being a form of robustness, we can make the pre- 
diction that modularity should confer evolvability to a sys- 
tem. The goal of this study is to test this prediction, as 
well as the predicted connection between robustness and 
evolvability, at the level of proteins. To do this, we look 
for whether indices of protein structural modularity and pro- 
tein structural robustness correlate with a protein evolvabil- 
ity index. 

Indexing modularity in biological systems is not a simple 
task, despite the fact that biological systems — and proteins 
in particular — are nonrandomly modular (Schlosser and 
Wagner 2004; del Sol et al. 2009) and that modularity seems 
to increase through evolutionary time (Bonner 1988). In this 
study, we measure protein structural modularity by assessing 
the density of helices and (3-sheet strands. Protein structural 
robustness can be indexed via contact density (England and 
Shakhnovich 2003), which is the average number of contacts 
a residue makes with other residues in the protein structure. It 
correlates strongly with protein designability, which is the 
number of protein sequences that stably fold into a given 
structure. Designability is an important predictor of the num- 
ber of mutations a structure can tolerate, and so it is a good 
indicator of protein mutational robustness for relatively struc- 
tured proteins (Li et al. 1996). It also seems to be important 
for the maintenance of stability and folding over the course of 
long periods of protein evolution (Govindarajan and 
Goldstein 1997; Taverna and Goldstein 2000). 

We assess structural modularity and robustness indices, 
and an index of protein evolvability, for a data set of 1 67 mam- 
malian proteins with empirically determined tertiary structures 
in order to look for an association between protein evolvability 
and either protein structural modularity or robustness. We 
find a positive association between our protein evolvability 
index and both structural modularity and robustness. 

Materials and Methods 

Our experimental approach is to test whether proteins with 
high indices of evolvability are more structurally modular 
and/or robust than proteins with lower evolvability. Our data 
set consists of orthologous genes that code for proteins with 
solved tertiary structures. For each protein in the data set, 
we obtain measures of structural modularity, structural 
robustness, and evolvability. 

The structural robustness index used here is contact 
density. In the context of relatively structured proteins 



(e.g., the data set used in this study), it can be assumed that 
the native fold is essential for function, so we can define 
robustness more specifically than we did in the Introduction. 
In this context, protein robustness is the ability for a protein 
sequence to maintain its native structure under mutational 
perturbation. Contact density is the average number of con- 
tacts an amino acid makes with other amino acids in the 
protein (England and Shakhnovich 2003). It has been shown 
to correlate with designability (England et al. 2003; England 
and Shakhnovich 2003; Bloom et al. 2006), which is the 
number of sequences that stably fold into a given structure 
and which is an important determinant of protein muta- 
tional robustness (Li et al. 1996; Bloom et al. 2005). Desig- 
nability determines the rate at which stable folding becomes 
less likely as random mutations accumulate (Bloom et al. 
2005; Wilke et al. 2005). High contact density implies many 
energetically favorable placements of strongly interacting 
amino acids, which relax energy constraints on the rest of 
the structure, thus allowing more sequences to fold into 
the structure (England and Shakhnovich 2003). We deter- 
mine contact density using one of the standard methods 
(e.g., see Shakhnovich et al. 2005): we divide the trace 
(i.e., the sum of the elements on the main diagonal) of 
the square of the contact matrix by the number of residues 
in the protein structure. A contact matrix is calculated from 
the atomic coordinates of a protein database (PDB) structure 
file. We use the Euclidean distances between ot-carbons to 
construct a distance matrix D. Using a threshold of 8 A to 
define "contact," and excluding trivial contacts (defined as 
those between residues that are separated by fewer than 
two intervening residues in the sequence), we convert D 
to a Boolean contact matrix C, where 1 represents "contact" 
and 0 represents "no contact." Contact density is the trace 
of the square of C, divided by the number or residues in the 
protein: Tr(C 2 )/A/. Our specific methodological choices rep- 
resent a compromise between the methods of H. Liao et al. 
(2005) who use oc-carbons and a contact threshold of 9 A, 
Shakhnovich et al. (2005) who use (3-carbons and a thresh- 
old of 7.5 A, and Bau et al. (2006) who use a-carbons and 
a threshold of 8 A. 

Since we are hoping to examine the relationship between 
modularity and evolvability, we would ideally measure pro- 
tein modularity in a way that reflects the extent of evolution- 
ary constraint. With our modularity measure, we aim to 
approximate the extent to which pleiotropic effects are re- 
stricted in the 3D space of the structure. A protein's inde- 
pendent units of evolutionary change (between which 
there are few pleiotropic effects) can be approximated 
through kinetic, thermodynamic, and/or functional modules 
(for a structural/folding perspective, see Copley et al. 2002 
and for a functional perspective, Bhattacharyya et al. 2006). 
Here, we use structural modules as our approximation, 
which simply assumes that the constraining antagonistic 
pleiotropic effects of adaptive mutations are primarily due 
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to requirements for folding and stability. This is very likely the 
case for most of the proteins in this data set because they all 
have solved tertiary structures and so are biased to having 
rather rigid structures. For the reasons discussed below, we 
interpret helices (including a, 3 10 , and n helices) and p-sheet 
strands as structural modules. We can thus approximate the 
overall density of functional modules in a protein by simply 
dividing the number of helices and strands (defined accord- 
ing to the Dictionary of Protein Secondary Structure; Kabsch 
and Sander 1 983) by the number of residues in the protein 
structure. We call this index "helix/strand density." 

Although secondary structure elements are likely the 
smallest units that have some degree of evolutionary inde- 
pendence, it is surely the case that completely independent 
protein structural modules are generally larger than individ- 
ual helices and strands. However, there are many reasons to 
believe that the majority of epistatic interactions between 
amino acids are highly localized within the 3D space of 
the protein structure. For one thing, the fundamental units 
of folding, function, and structure are clearly smaller than 
domains (del Sol and Carbonell 2007; Akiva et al. 2008; 
Laborde et al. 2008; Trifonov and Fenkel 2009). Evolution- 
ary independent motifs are also known to be very small 
(75% of them are between 10 and 40 residues; Su et al. 
2005). Specific cases of particularly modular and evolvable 
protein domains also suggest that secondary structure ele- 
ments are good descriptions of evolutionarily independent 
modules: The Duffy-binding-like domain is perhaps one of 
the most versatile and polymorphic protein domains in 
nature, and its ten semiconserved and evolutionarily inde- 
pendent sequence blocks have been shown to correspond 
near-perfectly with individual secondary structure elements 
(Howell et al. 2006). It was also shown that energetically 
independent modules have a mean size of 12 amino acids 
(Krishnan et al. 2007) and this corresponds well to the 
length of modules in our data set: while the mean length 
of individual secondary structure elements in our data set 
is 7.39 (standard deviation of the mean = 0.410), which 
is similar to previous calculations from empirical data 
(e.g., Sreerama et al. 1999), when mean module length 
is determined by dividing the full length of the protein struc- 
ture (i.e., including both structured and unstructured sites), 
the mean module length is 13.8 (standard deviation of the 
mean = 0.504). Furthermore, Krishnan et al. demonstrate 
that energetically optimal modules correspond to single sec- 
ondary structure elements until they reach about 30 amino 
acids in length (at which point energetically optimal mod- 
ules of this length or longer are rare) (Krishnan et al. 
2007). Finally, Emmert-Streib and Mushegian (2007) employ 
a method for domain identification that uses secondary 
structure elements as the fundamental units of structure, 
and they find that it performs equally well to more compli- 
cated analyses that include more detailed considerations of 
protein geometry and structure. This implies that secondary 



structure elements are the "main level at which protein do- 
mains attain their evolutionary optimal structural design" 
(Emmert-Streib and Mushegian 2007) and thus, that they 
offer a decomposition reflecting the protein's genuine 
epistatic architecture. 

Another reason helices and strands are an appropriate 
choice is because the exact number of them within the pro- 
tein structure can be easily and accurately ascertained from 
basic structural information. The small size of these struc- 
tural modules also makes them more useful for constructing 
an informative modularity index because the number of 
them per protein structure is far more variable, and thus in- 
formative, than the number of larger entities, such as do- 
mains. For the above reasons, we think that helices and 
P-sheet strands provide the best description of protein mod- 
ules that can be reliably determined for a large data set of 
proteins. 

In the design of our structural modularity index, we 
choose to divide the number of modules by the total num- 
ber of residues in the structure, as opposed to just the 
number of structured sites (i.e., those within helices or 
strands). Because all the proteins in our data set have solved 
tertiary structures, they are already biased to having a high 
proportion of structured sites, so our choice may be of little 
consequence. However, we make this choice because it is 
more conservative than the alternative. It is not clear that 
"unstructured" loop regions are free from all structural con- 
straints on adaptation (e.g., Regad et al. 2010), so we 
choose to divide by the total number of residues to prevent 
any possibility of biasing our modularity measure to higher 
values in proteins with high proportions of unstructured 
sites. This type of bias could cause a problem for the inter- 
pretation of our results, since Ridout et al. (2010) find that 
the fraction of unstructured sites correlates with evolution- 
ary rate. Though we do not find an association between the 
percentage of unstructured sites and evolvability in our data 
set, we design our modularity index so that we only risk 
reducing modularity measures in proteins with high frac- 
tions of unstructured sites. This assures that any bias in 
the index would only contribute to a negative correlation 
between structural modularity and evolvability, assuring that 
any positive correlation we detect would be a biologically 
meaningful signal. 

Our evolvability index is an attempt to measure the extent 
to which positive selection, as compared to negative and 
neutral selection, determines protein sequence evolution. 
It measures the overall amount of adaptive evolution a pro- 
tein experiences through its evolutionary history. It is a func- 
tion of both the underlying constraints on adaptation and 
the extent to which the protein is exposed to forces of pos- 
itive selection. Thus, it is more accurate to think of this as 
an index of realized evolvability. For example, even under 
strong positive selection, high structural and functional con- 
straints can cause this index to be low, and in this sense, 
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it gauges the level of adaptive constraint. At the same time, 
however, this index will be low if there are low levels of 
positive selection — even when amino acid substitutions 
are unconstrained and free to evolve independently. For this 
reason, it is important to measure this index as an average 
over a large species tree because we aim to capture the 
long-term evolvability of the protein structure, in a range 
of contexts, rather than the particular selection pressures 
that may exist during the divergence of any two species. Be- 
cause in this study we do measure the index as an average 
over a large species tree, we will not qualify it each time as 
an index for "realized evolvability." It will instead just be 
called an index for "evolvability." 

Our evolvability index is the proportion of sites with ev- 
idence of being under positive selection multiplied by the 
average rate of adaptive evolution at these sites. Estimates 
for these numbers are obtained by analyzing the evolution- 
ary history of each protein. For each of the proteins in the 
data set, a site model implemented by Phylogenetic Analysis 
by Maximum Likelihood (PAML) 3.15 codeml (Yang 1997, 
2007) is used to analyze 25 mammalian orthologs mapped 
to a known species phylogeny (fig. 1). Parameters seqfile, 
outfile, and treefile were specified appropriately, and other 



parameters were set as follows: verbose = 1, seqtype = 1, 
CodonFreq = 2, aaDist = 0, model = 0, NSsites = 3, ncatG 
= 3, icode = 0, RateAncestor = 1 , clock = 0, cleandata = 0, 
method = 0 (with all additional parameters being set to the 
codeml default settings, as described in the 2009 PAML 
version 4.3 user guide. From this analysis, we obtain the 
maximum likelihood estimates of the proportions of sites 
(p 0 , pi, and p 2 ) in each of three ce> categories (a> 0f <*> 1# and 
G) 2 ), and the a> values themselves (where co 0 is constrained 
to be <1, is constrained to be <1, and a> 2 is left uncon- 
strained) (go = c/ N /c/ s = the ratio of the nonsynonymous sub- 
stitution rate [c/ N ] to the synonymous substitution rate [d s ]\ 
We define the proportion of sites with evidence of being un- 
der positive selection as p 2 , and the rate of adaptive evolution 
at these sites as (d 2 - 1, so our evolvability index is p 2 (a> 2 - 1). 

This evolvability index is importantly different from pro- 
tein evolutionary rate indices used in many comparative 
studies (e.g., Bustamante et al. 2000; Fraser et al. 2002; 
Bloom and Adami 2003, 2004; Drummond et al. 2005; 
Herbeck and Wall 2005; Bloom et al. 2006; Chen and 
Dokholyan 2006; Lin et al. 2007). At least in theory (see 
qualification below), our index specifically measures the 
rate of substitutions that occur through positive selection. 

Ornithorhynchus 



Monodelphis 

i Dasypus 

I I Loxodonta 

' Echinops 

i Myotis 

I i Bos 

i Equus 

I Canis 

' Felis 

I Sorex 

' Erinaceus 

i Spermophilus 

| i Cavia 

I Rattus 

' Mus 

I Oryctolagus 

' ' Ochotona 

I Tupaia 

I Otolemur 

' Microcebus 

i Macaca 

i Pongo 

I Homo 

' Pan 

1.0 

Fig. 1. — The species phylogeny for the 25 mammalian species that are represented in the OrthoMaM database as of February 2009 (Ranwez et al. 
2007). 
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Conventional evolutionary rate indices take into account all 
types of substitutions and as a consequence (because neu- 
tral substitutions are so much more common than adaptive 
ones), they primarily reflect rates of neutral change. Because 
the ease with which a protein accommodates adaptive 
amino acid substitutions may not be directly related to 
the ease with which it accommodates neutral amino acid 
substitutions, if evolvability is defined as the ability to re- 
spond to positive selection (Wagner and Altenberg 1996; 
Pigliucci 2008; Wagner 2008), conventional evolutionary 
rate indices cannot serve as evolvability indices. Our evolv- 
ability index does, however, have one important weakness: 
For proteins without a class of sites under consistent and 
strong positive selection, it is possible that this index will 
overestimate the true level of adaptive evolution and be less 
negative than it should be, because when there are sites un- 
der significantly different levels of negative selection, it is 
possible for some sites to be identified as members of a third 
site class (i.e., an additional site class beyond those evolving 
primarily under neutral evolution and some specific level of 
negative selection) even when they do not experience sig- 
nificant positive selection. On the other hand, it should also 
be noted that an a> value below 1 (which would give an 
evolvability index below 0) does not imply that there are 
no sites under positive selection. It simply indicates that 
the average evolutionary rate across all branches of the phy- 
logeny (fig. 1) is below 1 and thus dominated by negative 
selection. Thus, the assumption made here is that most pro- 
teins have at least some sites under positive selection on at 
least some branches of the mammalian tree and that this 
positive selection is significant enough that it is what gen- 
erally defines the third class of sites with its distinct a> (as 
opposed to this being determined by the existence of 
two distinct rates of negative selection acting on different 
residues in the protein). 

We obtain indices for structural modularity, structural ro- 
bustness, and evolvability for 167 distinct proteins within 
the OrthoMaM database (Ranwez et al. 2007, accessed Feb- 
ruary 2009). This data set consists of all the proteins for 
which there is sufficient structural information to determine 
contact density and helix/strand density and for which 
orthologs of all 25 species are available. In this study, we 
limit our investigation to proteins from the same clade to 
eliminate potential confounding effects due to differences 
in phylogenetic structure between protein families from dif- 
ferent groups. The data set is broken up into categories 
based on the broadest hierarchical Gene Ontology catego- 
ries for molecular function (The Gene Ontology Consortium 
2000), according to AmiGO version 1 .7 (using the GO da- 
tabase release from 08 May 2010, Carbon et al. 2009). 
Within the data set, there are 1 55 proteins that have binding 
activity, 87 that have catalytic activity, 25 that have molec- 
ular transducer activity, 24 that have transcriptional regula- 
tory activity, 1 6 that have enzyme regulatory activity, 6 that 



have transporter activity, 5 that have structural molecule ac- 
tivity, 1 that has electron carrier activity, and 5 with no 
known molecular function. The average values for contact 
density, helix/strand density, and the evolvability index are 
assessed for each of the eight molecular function subsets 
that have a sample size larger than 1 . The data set is also 
broken up into two subsets according to whether the frac- 
tion of "structured" amino acids — that is, those that are part 
of a helix or strand — is relatively high or low. We also analyze 
the data set in three discrete subsets according to whether 
the secondary structure elements within the protein struc- 
ture comprise only helices, only strands, or both. 

To assess the nature of the relationship of protein evolv- 
ability to both structural modularity and robustness, we 
perform four tests. First, we perform a locally weighted poly- 
nomial fit to analyze the evolvability index as a function of 
structural modularity and robustness (with 0.5 of the data 
set used for each local fit). We carry out this analysis in R, 
using the "lowess" function. Second, each data set is divided 
into two equally sized groups according to the size of the 
evolvability index (dividing at the median value), and then 
Student's Mest, Welch's approximate f-test, and a Wilcoxon 
rank sum test are used to identify any significant difference 
between mean helix/strand density or contact density. We 
also compare the upper and lower third of the data set 
in a similar manner. Third, we perform Pearson's correlation 
and Spearman's rank correlation tests between the evolv- 
ability index and either contact density or modularity to look 
for any indication of an association between these two pairs 
of indices. Finally, our fourth test assess whether the vari- 
ance in the evolvability index is significantly different for pro- 
teins with high versus low modularity or robustness: the 
data set is divided into two equally sized groups according 
to the size of either helix/strand density or contact density 
(dividing at the median value), and an F ratio test is per- 
formed between the two halves of the data set. 

For the interpretation of our results, we rely on the as- 
sumption that different selection regime types are distrib- 
uted approximately randomly across different protein fold 
types — that is, that the structural modularity and robustness 
of a protein does not significantly influence the selective 
forces it experiences. We test this assumption by looking 
for an association between protein functional importance 
and either helix/strand density or contact density. We mea- 
sure functional importance by measuring the extent of neg- 
ative selection acting on the protein, which is defined here 
as p 0 (1 - (o 0 ). 

We perform multiple regression to tease apart the sepa- 
rate influences of helix/strand density and contact density on 
the evolvability index. We divide the data set at the median 
value for the evolvability index and analyze the two halves 
separately. We determine the quadratic best-fit functions 
while constraining the functions to be equal to the median 
evolvability value at the lowest observed levels of helix/ 
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strand density and contact density. We assess statistical sig- 
nificance of partial regression coefficients and compare the 
magnitude of standardized partial regression coefficients. 
We also perform a first-order polynomial multiple regression 
on the helix-only subset of the data set (since, as explained in 
Results, a linear fit was determined to be appropriate for this 
subset of the data). 

To exclude possible confounding factors, we consider 
whether some additional protein variables co-correlate with 
our index of protein evolvability and either protein structural 
modularity or robustness. Gene compactness is the domi- 
nant factor determining evolutionary rate in mammals, 
and gene essentiality is among the factors of secondary im- 
portance (Liao et al. 2006). To determine whether it is nec- 
essary to control for gene compactness when assessing the 
relationship between the evolvability index and either helix/ 
strand density or contact density, we test whether several 
compactness indices are significantly correlated with both 
the evolvability index and either helix/strand density or con- 
tact density. To determine whether it is necessary to control 
for gene essentiality, we assess whether there is a significant 
difference in the mean evolvability, structural modularity, or 
structural robustness index for essential versus nonessential 
proteins (i.e., those encoded by essential versus nonessential 
genes). 

Results 

In this study, we test whether there is an association in pro- 
teins between structural modularity, structural robustness, 
and evolvability. We gauge protein structural robustness 
by assessing contact density — the average number of con- 
tacts an amino acid makes with other amino acids in the 
protein. We gauge structural modularity by assessing he- 
lix/strand density, which is the number of regular secondary 
structure elements divided by the number of residues in 
a protein structure. Unlike contact density, helix/strand den- 
sity is not an established and well-studied index, so we test 
the basic assumption that underlies it: that the overall num- 
ber of helices and strands correlates with the number of res- 
idues in a protein (if this were not the case, normalizing for 
protein size by dividing by the number of residues in the pro- 
tein would over-correct for the influence of protein size). We 
find that there is a highly significant correlation between 
the number of residues and the number of helices and 
strands (fig. 2) and thus, that our normalization procedure 
is appropriate. 

As described in detail in Materials and Methods, we ob- 
tain indices for modularity, robustness, and evolvability for 
167 distinct mammalian proteins, each with orthologs f rom 
25 species. To assess whether protein structural modularity 
and robustness have an influence on protein evolvability, 
we test whether there is a positive association between 
the evolvability index and either structural modularity or 



robustness. The evolvability index is plotted as a function 
of both the structural robustness index (contact density) 
and the structural modularity index (helix/strand density) 
(fig. 3A, and B). The contact densities of the proteins in 
the data set have a mean value of 5.1 and a standard de- 
viation of 1 .0, the helix/strand densities have a mean of 
0.082, and a standard deviation of 0.023, and the evolvabil- 
ity indices have a mean of -0.0095 and a standard deviation 
of 0.066. 

The relationship between contact density and the evolv- 
ability index reveals two interesting and significant patterns. 
First, a general positive association between these two indices 
is apparent when we perform a locally weighted polynomial 
fit that provides a sliding window analysis of the relationship 
between the evolvability index and contact density (fig. 3Q. 
While it seems that the average evolvability index generally 
increases with increasing contact density, this analysis also re- 
veals that the relationship is not simple or linear. This general 
positive association between contact density and the evolv- 
ability index is also reflected in hypothesis test results: When 
the sample of proteins is divided into two equally sized groups 
according to their evolvability index (less-than-median vs. 
greater-than-median), we find that the mean contact density 
of the group with relatively high evolvability indices (5.30) is 
significantly greater than the mean contact density of the 
group with relatively low evolvability indices (4.94) (P = 
0.0101 for Student's f-test, P = 0.0100 for Welch's approx- 
imate Mest, and P = 0.0125 for Wilcoxon rank sum test with 
continuity correction, all one-tailed) (fig. 3A). Much of the 
data set is highly clustered with respect to the evolvability 
index, and very small differences can be of questionable 
biological relevance even when they are statistically signifi- 
cant. We therefore also compared the highest and lowest 
thirds of the data set with respect to the evolvability index 
and found that the mean contact density of these two smaller 
groups still differs significantly. Further evidence for the pos- 
itive association between contact density and the evolvability 
index is that these two indices have a borderline significant 
rank correlation (P = 0.0682 for Spearman's rank correlation 
test, one-tailed), though they do not have a significant 
linear correlation (P = 0.391 for Pearson's correlation test, 
one-tailed). 

The above patterns imply that proteins with higher evolv- 
ability indices are generally more designable and robust than 
proteins with lower evolvability indices. These patterns also 
prove to be even more pronounced when we look only at 
proteins that contain helices but no strands (fig. 4/\). Ana- 
lyzing this subset of data independently, we find a significant 
rank correlation between the indices but not a significant 
linear correlation (P = 0.1 19 for Pearson's correlation test, 
one-tailed; P = 0.01 62 for Spearman's rank correlation test, 
one-tailed). Also, when this subset of data is divided in half 
according to the median evolvability index, we find that the 
difference between the mean contact density is significant 
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Fig. 2. — (A) The trace of the square of the contact matrix "TrC2" as a function of the number of residues in the protein structure "N." Pearson correlation 
coefficient = 0.970. (B) The total number of helices and strands "SS" in a protein structure as a function of the number of amino acids in the protein structure 
"N." Pearson correlation coefficient = 0.966. (0 The evolvability index "DS" as a function of the number of amino acids in the protein structure "N." 
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Fig. 3. — (A) The evolvability index "DS" as a function of the structural robustness index (contact density) "D." Spearman's rank correlation 
coefficient = 0.1 16, one-tailed P = 0.0682. (B) The evolvability index "DS" as a function of the structural modularity index (helix/strand density) "M." 
Spearman's rank correlation coefficient = 0.1 64, one-tailed P = 0.01 69. {A, B) The color of the data points indicates whether they are part of the upper 
or lower half of the data set with respect to "DS" divided at the median. The mean "D" or "M" of the light green data points is indicated by the upper 
line, and the mean "D" or "M" of the dark blue data points is indicated by the lower line. Curves are best-fit parabolic functions without a constant 
basis and constraining "DS" to be equal to the median "DS" value for the lowest observed "D" or "M" value. Both fits are highly significant (P << 
0.0001 according to analysis of variance F statistic). (C, D) Sliding-window analysis (i.e., locally weighted polynomial regression) of the mean evolvability 
index "DS" as a function of contact density "D" (C) or helix/strand density "M" (D). The proportion of the data set used to fit each local polynomial is 0.5. 



(P = 0.0394 for Student's f-test, one-tailed; P = 0.0395 for 
Welch's approximate f-test, one-tailed; P = 0.0366 for Wil- 
coxon rank sum test with continuity correction, one-tailed) 
and greater than it is for other subsets of the data (fig. 5A) 
In addition to the above evidence for a positive associa- 
tion between the evolvability and structural robustness in- 
dices, we observe a second pattern between these 
indices: high contact density seems to be associated with 
greater variance in the evolvability index across different 
proteins (fig. 6A). Indeed, when the data set is divided into 
two equally sized groups according to contact density (di- 
viding at the median), the variance in the evolvability index 
is significantly greater for proteins with higher contact den- 
sity than for those with lower contact density (0.00164 vs. 
0.00718) (P « 0.0001 for F ratio test of null hypothesis 
that ratio between the variances is 1). Furthermore, this dif- 
ference in variance is not dependent on the outlying data 
points: if the two most outlying data points with respect 



to the evolvability index are removed from both halves of 
the data set, there is still a significant difference between 
the variances of the two halves. Thus, we observe an in- 
crease in the variance of the evolvability index as contact 
density increases. 

In order to analyze the relationship between structural 
modularity and evolvability in proteins, we perform the 
same tests as above, but this time for helix/strand density. 
We perform a local fit on the full data set to analyze the 
relationship between the evolvability index and helix/strand 
density (fig. 3D). As with contact density, this analysis reveals 
that the evolvability index generally increases with increas- 
ing helix/strand density. However, it also shows that the re- 
lationship between these two indices is not necessarily 
a simple linear one. 

We find that the mean helix/strand density of proteins 
with relatively high evolvability indices (0.0876) is signifi- 
cantly greater than the mean helix/strand density of proteins 
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Fig. 4. — Correlation and rank correlation analysis for a subset of 
the data set, consisting of proteins that contain helices, but no strands. 

(A) The evolvability index "DS" as a function of the structural robustness 
index (contact density) "D." The best-fit line is shown in red but is not 
statistically significant (Pearson's correlation coefficient = 0.215, one- 
tailed P = 0.1 18), but the rank correlation between DS and D is significant 
(Spearman's rank correlation coefficient = 0.380, one-tailed P = 0.0162). 

(B) The evolvability index "DS" as a function of the structural modularity 
index (helix/strand density) "M." The best-fit line is shown in red and 
is statistically significant (Pearson's correlation coefficient = 0.414, one- 
tailed P = 0.0092), and the correlation between "M" and "DS" is 
statistically significant even after correcting for the relationship between 
"D" and "DS". The rank correlation between the variables is also 
significant (Spearman's rank coefficient = 0.426, one-tailed P = 0.0748). 

with relatively low evolvability indices (0.0768) (P = 0.001 35 
for Student's f-test, one-tailed; P = 0.001 35 for Welch's ap- 
proximate f-test, one-tailed; P = 0.00324 for Wilcoxon rank 
sum test with continuity correction, one-tailed) (fig. 3B). The 
difference in mean helix/strand density between the highest 
and lowest thirds of the data set with respect to the evolv- 
ability index is also significant. These hypothesis tests con- 
firm the generally positive association between secondary 
structure density and the evolvability index observed in 
the local fit. Furthermore, while helix/strand density does 
not significantly correlate (i.e., linearly) with the evolvability 
index (P = 0.375 for Pearson's correlation test, one-tailed), 
it does significantly rank correlate with the evolvability 



index (P = 0.0169 for Spearman's rank correlation test, 
one-tailed). 

The evidence for a positive association between second- 
ary structure density and the evolvability index is stronger 
when we analyze the subset of data consisting of helix-only 
proteins (fig. AB). In this case, we find a significant linear 
correlation as well as rank correlation (P = 0.0092 for Pear- 
son's correlation test, one-tailed; P = 0.00748 for Spear- 
man's rank correlation test, one-tailed). Also, when this 
subset of the data is divided in half at the median evolvability 
index, the difference between the mean secondary structure 
density for the two halves of the data set is significant (P = 
0.00400 for Student's ttest, one-tailed; P = 0.00409 for 
Welch's approximate Mest, one-tailed; P = 0.00332 for Wil- 
coxon rank sum test with continuity correction, one-tailed) 
and greater than the difference between the means for 
other subsets of the data (fig. 5B). 

Finally, as in the case of structural robustness, the evolv- 
ability indices of proteins with relatively high structural mod- 
ularity are significantly more variable than those of proteins 
with relatively low structural modularity (0.00657 as com- 
pared with 0.00224; P « 0.0001 for F ratio test of null 
hypothesis that ratio between the variances is 1) (fig. 6B). 
This result holds regardless of whether outlying data points 
are included or not (the difference in variance is significant 
even if the two most outlying data points with respect to the 
evolvability index are removed from both halves of the data 
set). Together with the corresponding results for contact 
density, this implies that higher structural modularity and ro- 
bustness are associated with greater variance in realized 
protein evolvability. 

Because our indices for structural modularity and struc- 
tural robustness correlate with one another to some extent 
(fig. 7), the above results on their own do not clarify whether 
either of these indices have independent effects on protein 
evolvability. We therefore perform multiple regression to 
tease apart the separate influences of helix/strand density 
and contact density on the evolvability index. The full data 
set is divided at the median evolvability index, and the two 
halves are analyzed separately. Quadratic fits to both halves 
of the data set are highly significant (analysis of variance 
P « 0.0001). The estimates of the individual partial re- 
gression coefficients — the parameters that describe how 
helix/strand density and contact density independently in- 
fluence the evolvability index — were not significantly differ- 
ent from 0 in either case (Student's f-test). Thus, the relative 
statistical significance of the partial regression coefficients 
cannot be used to exclude either helix/strand density or 
contact density as a possible independent predictor of the 
evolvability index. We find that, for both halves of the data 
set, the standardized partial regression coefficient for helix/ 
strand density is nearly 1 00 times greater in magnitude than 
the standardized partial regression coefficient for contact 
density (regardless of the order in which the two variables 
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are added to the model), however, we cannot conclude any- 
thing from this because the partial regression coefficients 
are not significantly different from 0. 

As stated above, the helix-only proteins show a stronger 
pattern, and for this subset of the data, the relationships 
between helix/strand density, contact density, and evolvabil- 
ity can all be meaningfully approximated as linear (fig. 4). 
We therefore perform multiple regression on this subset 
of data that comprises helix-only proteins using a first-order 
polynomial fit. We find that helix/strand density is a signifi- 
cant contributor to the variation in the evolvability index 
even after controlling for the influence of contact density 
(the standardized partial regression coefficient for helix/ 
strand density is 0.387 and P = 0.0405), whereas this is 
not the case the other way around (the standardized partial 
regression coefficient for contact density is 0.0763 and P = 
0.675). Both the magnitude of the standardized partial re- 
gression coefficients and the difference in whether they are 
significant demonstrate that helix/strand density is more im- 
portant than contact density in determining the value of the 
protein evolvability index. 



In addition to analyzing helix-only proteins in isolation, 
we examined several other subsets of the data indepen- 
dently. Specifically, we separately analyzed the subset of 
proteins that contains strands and no helices, and the subset 
that contains a mixture of helices and strands. We also di- 
vided up the data set for separate analysis according to the 
molecular function of the proteins and according to 
whether they are "structured" or "unstructured" (see Mate- 
rials and Methods). We find that the mean helix/strand den- 
sity and contact density do differ between these various 
categories (fig. 8). We also find that contact density is neg- 
atively correlated with the fraction of unstructured sites and 
that structural modularity is positively correlated with the 
fraction of unstructured sites but that the evolvability index 
is not associated positively or negatively with the fraction of 
unstructured sites. We find no major incongruencies among 
these data subsets in regard to the relationship they reveal 
between the indices for modularity, robustness, and evolv- 
ability (figs. 5 and 9). However, these different subsets reveal 
the relationships between the indices to varying extents. As 
mentioned above, the helix-only category of proteins reveals 



Genome Biol. Evol. 3:456-475. doi:10.1093/gbe/evr046 Advance Access publication May 21, 201 1 



465 



Rorick and Wagner 



GBE 



0.2 



-0.2 - 



-0.4 



•(*)6 
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much stronger positive associations between the evolv- 
ability index and both contact density and helix/strand 
density (figs. 4 and 5). Conversely, the limitations of he- 
lix/strand density and contact density to act as indicators 
of the level of adaptive constraint is reflected in the fact 
that these patterns are considerably less pronounced 
for classes of proteins known to be highly unstructured 
(fig. 5) (Wright and Dyson 1999; Garza et al. 2009) and 
for the less structured half of the data set (figs. 5 and 
1 0). This implies that these structural indices fail to capture 



the relevant constraints on adaptation for unstructured 
proteins, as expected. 

Testing for Potential Confounding Factors 

To index evolvability in proteins, we measure the amount of 
adaptive evolution a protein experiences. As mentioned 
above, in using this index, we are assuming that high levels 
of evolution through positive selection can be attributed at 
least partially to low constraints on adaptation (i.e., high 
evolvability) as opposed to only high positive selection 
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Fig. 7. — Helix/strand density "M" as a function of contact density "D." 



0.12 



M 



pressure, and that the structural modularity or robustness 
of a protein does not significantly influence the selective 
forces it experiences. It is important that these assumptions 
are true because a confounding cause of our results would 
be that proteins with high structural modularity or robust- 
ness for some reason experience preferentially higher pos- 
itive selection pressure. To verify that this is not the case, 
we look for whether functional importance is associated 
with structural modularity or robustness. If functionally im- 
portant proteins — which we define to be those under 
strong negative selection — are generally more modular 
and robust than less important proteins, we would have 
to consider the possibility that our indices for structural 
modularity and robustness only correlate with evolvability 
due to recruitment of modular and robust folds into impor- 
tant functional roles or through gradual selection for in- 
creased modularity or robustness in important proteins 
(though this latter possibility is unlikely for reasons dis- 
cussed below). However, we do not find any association 
between the index for functional importance and either 
helix/strand density or contact density (supplementary 
fig. S1 , Supplementary Material online), so we reject these 
possible confounding causes of our results. 

According to a recent study by Ridout et al. (2010), un- 
structured sites (i.e., those which are not part of a regular 
secondary structure element) are more likely to have high cd 
values. This poses a possible alternative explanation for our 
observed association between structural modularity and 
evolvability indices (figs. 3B, 3D, and AB)\ i.e., that it is just 
a trivial consequence of there being a greater proportion of 
unstructured sites in highly modular proteins. This is espe- 
cially plausible since we also happen to find that proteins 
with higher proportions of unstructured sites (defined here 



as those not within a helix or strand) tend to have higher helix/ 
strand density (supplementary fig. S2, Supplementary Mate- 
rial online). However, we rule out this alternative interpreta- 
tion because our evolvability index shows no association with 
the proportion of unstructured sites (supplementary fig. S3, 
Supplementary Material online). 

Our index of structural robustness — contact density — has 
been previously shown to correlate with protein length 
(Lipman et al. 2002; Bloom et al. 2006), and we find this 
correlation in our data also (supplementary fig. S4, Supple- 
mentary Material online). To rule out the possibility that the 
association between contact density and the evolvability in- 
dex (figs. 3A, 3C, and 44) is caused by a co-correlation of 
both indices to protein length, we test for whether there is 
any relationship between evolvability and protein length. 
We find no significant correlation between these two 
variables (fig. 2C). Furthermore, when we divide the data 
set into two groups (one comprising those with less- 
than-median protein length and the other comprising those 
with greater-than-median protein length), we find no sig- 
nificant difference in the mean evolvability indices of these 
two groups. 

Liao et al. (2006) demonstrate that gene compactness 
and gene essentiality are both important determinants of 
the overall rate of mammalian protein evolution. To deter- 
mine whether it is necessary to control for gene compact- 
ness when examining the relationships between protein 
structural modularity, robustness, and evolvability, we test 
whether gene compactness indices co-correlate with the 
evolvability index and either protein structural modularity 
or robustness (supplementary figs. S5, S6, and S7, Supple- 
mentary Material online). We found no co-correlations and 
we find only two significant negative correlations among all 
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Fig. 8. — (A) The mean structural robustness index (contact density) for proteins of different fold and functional categories. (B) The mean structural 
modularity index (helix/strand density) for proteins of different fold and functional categories. UnsH, relatively unstructured half of data set; StrH, 
relatively structured half of data set; Bind, binding activity; Cat, catalytic activity; Trdsr, molecular transducer activity; TrsR, transcription regulator 
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molecular function specified; HeOnly, secondary structure elements consist of helices only; StOnly, secondary structure elements consist of strands only; 
HeSt, secondary structure elements consist of both helices and strands. Error bars show the standard error of the sample mean and are included where 
the sample size for the category is above 1 . 



the tests we perform — between CDS length and contact 
density and between CDS length and helix/strand density 
(before correcting for multiple tests, P « 0.001 and 
0.047, respectively). Because CDS length does not also neg- 
atively correlate with the evolvability index, we conclude 
that CDS length cannot be responsible for the observed as- 
sociations between the evolvability index and structural 
modularity and robustness. To determine whether it is nec- 



essary to control for gene essentiality, we assess whether 
there is a significant difference in helix/strand density, con- 
tact density, or the evolvability index in proteins correspond- 
ing to essential versus nonessential genes (supplementary 
fig. S8, Supplementary Material online). We find no signifi- 
cant differences among these comparisons (with the signif- 
icance cutoff set to P = 0.05 before correcting for multiple 
tests). Therefore, we conclude that gene essentiality is not 
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Fig. 9. — (A, B) Protein structural robustness "D" with proteins categorized by molecular function. The legend lists molecular functions in the order 
of their frequency in the data set, starting at the top with the most frequent. Where a protein has more than one molecular function, it is specified as 
the least frequent one. (A) The evolvability index "DS" as a function of the structural robustness index (contact density) "D." (B) Transformed evolvability 
index "TDS" as a function of "D". Transformation of "DS," by subtracting the median value and then taking the square root, allows for better 
visualization of the data. (C, D) Protein structural modularity "M" with proteins categorized by molecular function. The legend lists molecular functions 
in the order of their frequency in the data set, starting at the top with the most frequent. Where a protein has more than one molecular function, it is 
specified as the least frequent one. (0 The evolvability index "DS" as a function of the structural modularity index (helix/strand density) "M." (D) 
Transformed evolvability index "TDS" as a function of M. Transformation of "DS," by subtracting the median value and then taking the square root, 
allows for better visualization of the data. 
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likely to be a confounding factor in our analysis of the rela- 
tionship between structural modularity, structural robustness, 
and evolvability. 

Discussion 

From a theoretical standpoint, a system must be robust to be 
evolvable by natural selection. And yet, it remains unclear 
whether the ubiquity of robustness in nature can be ex- 
plained by selection for evolvability or whether it has evolved 
for the sake of buffering mutational and/or environmental 
noise (Hartl and Taubes 1 996; Wagner et al. 1 997; Ancel and 
Fontana 2000; Meiklejohn and Hartl 2002; de Visser et al. 
2003; Wagner 2005). Modularity is another characteristic of 



biological systems with obscure origins, and it is thought to 
contribute to robustness and evolvability (Wagner et al. 
2007). Investigation into the origins of modularity and ro- 
bustness is stymied by the fact that there is scant empirical 
evidence that they are biologically significant determinants 
of evolvability, probably because defining and measuring 
modularity and robustness in real biological systems remains 
problematic. Here, we use one established index of protein 
structural robustness (contact density as a measure of des- 
ignability) and another index of our own design (helix/strand 
density as a measure of structural modularity) to test 
whether robustness and modularity are associated with 
evolvability in proteins. Prior to this study, we knew little 
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Fig. 9. — (Continued) 



about the distribution of helix/strand density across different 
proteins, but previous work had already established that 
contact density is a determinant of protein family size 
(Shakhnovich et al. 2005), sequence diversity (Hartling 
and Kim 2008), functional diversity (Ferrada and Wagner 
2008), and overall evolutionary rate (c/ N ) in yeast (Bloom 
et al. 2006). These studies provide some indication that con- 
tact density contributes to reduced constraints and possibly 
evolvability. However, Bloom etal. (2006) could not fully dis- 
entangle the effects of contact density and protein length 
on d N , so it is possible that contact density only correlates 
with c/ N through co-correlation with protein length or some 
other unmeasured factor (such as modularity). Furthermore, 
these studies do not infer evolvability by measuring the 
amount of evolutionary change brought about through pos- 
itive selection, as we do here. Instead they use protein family 



size, c/ N , or functional or sequence diversity, which are all 
influenced by more factors than the two which contribute 
to our evolvability index (i.e., the extent of constraints on 
adaptation and positive selection strength). 

Protein Structural Modularity and Robustness and 
Their Effects on Protein Evolvability 

Here, we address whether structural modularity and robust- 
ness contribute to evolvability in proteins. We hypothesize 
that high values for either structural modularity or robust- 
ness should reflect low structural constraints and since these 
likely represent the dominant constraints in structured pro- 
teins, high evolvability. Therefore, if modularity and robust- 
ness confer evolvability in proteins — assuming different 
selection regimes are distributed approximately randomly 
among different protein folds — we expect to find a positive 
association between our index for protein evolvability and 
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Fig. 10. — The evolvability index "DS" as a function of {A) the protein structural robustness index (contact density) "D" and (B) the protein 
structural modularity index (helix/strand density) "M." The data set was divided into two equally sized groups according to a protein's proportion of 
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both helix/strand density and contact density. Indeed, this is 
what we find. Specifically, we find that 1 ) a sliding-window 
analysis, which fits distinct polynomials to local subsets of 
the data, reveals a generally positive association both 
between contact density and the evolvability index and 
between helix/strand density and the evolvability index 
(figs. 3 C # D) and that this pattern is stronger when we con- 
sider only the proteins containing helices and no strands 
(fig. 4); 2) the evolvability index is significantly rank corre- 
lated with the modularity index and, to a lesser extent, 
the robustness index (fig. 3) and that this is especially the 
case for helix-only proteins (fig. 4); and 3) multiple regres- 
sion analysis of helix-only proteins demonstrates that the 
correlation between the modularity and evolvability indices 
is independent of contact density and that this is not the 
case the other way around — implying that helix/strand den- 
sity is more important than contact density in determining 
the value of the evolvability index and that the apparent 
association between contact density and evolvability in 
proteins may be driven by co-correlation of structural mod- 
ularity to both contact density and evolvability. 

We rule out the possibility that differences in evolvability 
are due to differences in selection regime when we fail to 
find an association between structural modularity or robust- 
ness and protein functional importance (supplementary fig. 
S1 , Supplementary Material online). This finding means that 



we can exclude two possible alternative interpretations. The 
first is that, in the long term, robust protein folds — being 
more evolvable — end up being recruited into functional 
roles which demand high levels of evolvability because they 
are good at tolerating shifting selection pressures (in other 
words, the possibility that highly robust proteins are predis- 
posed to biological roles where adaptive changes are fre- 
quent and that protein robustness persists through 
association with these adaptive changes). This would con- 
stitute a mechanism of fold selection for evolvability (i.e., 
selection for the most evolvable fold, out of multiple distinct 
folds that can perform the same function) (Taverna and 
Goldstein 2000; England et al. 2003). The second alternative 
interpretation is that strong positive selection, which would 
be reflected as high levels of adaptive evolution, causes pro- 
teins to gradually evolve greater robustness. From a theoret- 
ical standpoint, this interpretation is unwieldy to begin with 
because contact density and helix/strand density, being in- 
herent features of the protein structure, cannot evolve effi- 
ciently through point mutations because distinct protein 
structures are separated in sequence space by vast distances 
composed almost entirely of unfoldable sequences (i.e., 
there is no shape space covering) (Babajide et al. 2001). 
Hence, one of the basic requirements for adaptive evolu- 
tion — that the trait can change in a quasi-gradual way — is 
not fulfilled by either helix/strand density or contact density. 
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We find that higher structural modularity and robustness 
are associated with greater variance between proteins in the 
evolvability index (fig. 6). We think this is most likely due to 
the fact that the evolvability index is the product of two var- 
iables, at least one of which (i.e., the proportion of sites under 
positive selection) is very likely to be binomially distributed and 
thus have increasing variance with increasing magnitude. This 
is because the proportion of sites under positive selection may 
simply be a sum of multiple independent Bernouli trials that 
each determine whether or not a given site is under positive 
selection), divided by the total number of sites. Accordingly, 
since the number of successful outcomes of any Bernouli trial 
is binomially distributed, if we specify P as the probability that 
any given site is under positive selection and n as the total 
number of sites in the protein, we can describe this proportion 
of sites under positive selection as a binomially distributed var- 
iable with an expected value P, and a variance np(1 - P)/n 2 . 
Thus, we expect that both the mean and variance will increase 
with Pso long as Pis less than 0.5 — which is certainly the case 
for the proteins in our data set. 

The Relationship between Protein Structural 
Modularity and Robustness 

There are some other minor conclusions that can be drawn 
from this work. By quantifying both protein structural 
modularity and robustness, we have the opportunity to 
address how these two variables relate to one another. 
The exact relationship between them has not been thor- 
oughly investigated in real proteins. All that is known is 
that, for lattice models, mutationally robust "prototype" 
sequences are characterized by an overrepresentation of 
special sequence motifs that fold in a context-insensitive 
manner — reminiscent of "folding modules" (Cui et al. 
2002). Also, Li et al. (2007) show that modular "stabilizing 
fragments" can be recombined to create highly robust chi- 
meric proteins. Lastly, for approximately factorizable net- 
works, theory shows that the mean clustering coefficient 
(which is an index for modularity) is determined by the het- 
erogeneity and density of the network. Though it is not clear 
whether proteins represent approximately factorizable net- 
works, single domain proteins do seem to fit their general 
profile. Network density is very related to contact density 
when it is applied to amino acid structural networks (Dong 
and Horvath 2007), so this could be what causes the corre- 
lation we find between contact density and helix/strand 
density. However, because we find that contact density does 
not tightly correlate with helix/strand density (fig. 7) and 
that helix/strand density has an effect on the evolvability in- 
dex that is independent of the effect of contact density, we 
conclude that structural modularity and structural robust- 
ness — at least as indexed here — describe somewhat differ- 
ent information. However, because they are also clearly 
intertwined, our results emphasize the importance including 
considerations of protein structural modularity in studies in- 



volving contact density and the value of developing meth- 
ods to quantify modularity in real biological systems. 

Robustness of Unstructured Proteins 

It is important to note that our indices for both modularity and 
robustness are structural and that structural constraints are on- 
ly good approximations of the overall constraints on adapta- 
tion where structure is essential for function. While this is true 
for many proteins, there are some important exceptions. For 
example, many transcription factor proteins only require struc- 
tural stability at a small fraction of their amino acids (Garza 
etal. 2009) and disordered regions often have important func- 
tional roles and conserved sequences (Marisco et al. 2010). 
Moreover, it has been hypothesized that proteins without 
a rigid structure achieve high robustness of function (despite 
essentially zero structural robustness) (Brown etal. 2002), flex- 
ibility of function for transient and specific interactions (Singh 
et al. 2007), and the ability to evolve through promiscuous 
functions (Wroe et al. 2007) — all of which contribute to higher 
evolvability. Our results indicate that structural constraints do 
not capture the relevant constraints on adaptation for some 
classes of proteins in our data set — specifically, those which 
are relatively unstructured (figs. 5, 8, and 10). Therefore, 
our results support the idea that, for some proteins, proper 
function is not directly dependent on structural stability, 
and in turn, that protein functionality cannot always be ap- 
proximated through measures of structural stability. This is sig- 
nificant in light of the common assumption within the field of 
structural biology that structure equals function. However, be- 
cause the great majority of proteins with solved structures do 
rely on an ordered structure to perform their functions, we did 
not think that these exceptions would cause enough of a prob- 
lem to warrant their exclusion from our data set. 

Future Research about the Determinants of Protein 
Evolutionary Rate 

There has been considerable research in the past several 
years aiming to identify the important determinants of 
protein evolutionary rate (c/ N or d N /d s ). For the reasons 
stated above, we believe that our evolvability index is 
fundamentally different from these measures of pred- 
ominantly neutral evolutionary change. Furthermore, in 
these studies about the determinants of evolutionary rate, 
c/ N or d N /d s is generally inferred from a comparison of only 
two species, whereas our evolvability index is inferred from 
a phylogeny of 25 species. Nevertheless, it is certainly pos- 
sible that constraints on neutral evolution to some extent 
translate to constraints on adaptive evolution. Therefore, 
we take into consideration the dominant factors determin- 
ing neutral evolutionary rate in order to verify that none of 
these are in fact responsible for our observed associations 
between indices of modularity, robustness, and evolvability. 
We do not find any of them to be confounding. Even though 
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gene expression level is a dominant factor determining pro- 
tein evolutionary rate in bacteria (Rocha and Danchin 2004) 
and yeast (Zhang and He 2005; Drummond et al. 2006), we 
are not concerned that it is a confounding factor in this 
study because it has only a negligible role in determining 
the evolutionary rate of mammalian proteins (Liao et al. 
2006; Vinogradov 201 0). In fact, because it has only recently 
been elucidated that the determinants of mammalian pro- 
tein evolutionary rate differ considerably from those deter- 
mining the rates in yeast and bacteria, our results are of 
interest in that they shed preliminary light on how protein 
structure plays a role in determining the rate of at least 
adaptive protein evolutionary change in mammals, and they 
raise the question of whether similar patterns would also be 
found in bacterial and fungal proteins. 

In summary, we conclude that proteins with high rates of 
adaptive evolution, and thus, high apparent evolvability, have 
higher helix/strand density and contact density than proteins 
with lower apparent evolvability and that this pattern is con- 
sistent with the idea that modular and/or designable folds — 
being less structurally constrained — accommodate adaptive 
changes at a higher rate than proteins with low structural 
modularity and robustness. Furthermore, we conclude that 
the effect of structural modularity on protein evolvability is 
independent of structural robustness and that it is therefore 
possible that structural modularity drives the relationship be- 
tween robustness and evolvability observed in proteins. 

Supplementary Material 

Supplementary figs. S1-S8 are available at Genome Biology 
and Evolution online (http://gbe.oxfordjournals.org/). 
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